# Multimodal document retrieval

Holo1-3B is a Transformer-based multimodal model focused on visual document retrieval. It performs well on the WebVoyager benchmark while balancing accuracy and cost.

Transformers English

The Holo1-7B GGUF model is part of the Surfer-H system and is suited to multimodal tasks such as visual document retrieval. It is particularly strong at web page interaction and network monitoring, achieving high accuracy at low cost.

Transformers English

Granite Vision 3.3 2b Embedding

An efficient embedding model built on granite-vision-3.3-2b, designed for multimodal document retrieval and capable of processing documents containing tables, charts, infographics, and complex layouts.

Multimodal Fusion

Transformers English

Colqwen2 2b V1.0

A visual retrieval model based on Qwen2-VL-2B-Instruct and the ColBERT strategy, capable of generating multi-vector representations of text and images.

Text-to-Image Supports Multiple Languages
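
To illustrate what "multi-vector" retrieval in the ColBERT style means, here is a minimal sketch of late-interaction (MaxSim) scoring in plain Python. It is not the model's actual API: the function names `normalize` and `maxsim_score` and the 2-D toy embeddings are assumptions made for illustration. In practice the vectors would come from the model, one per query token and one per image patch.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query vector, take its best
    cosine match among the document vectors, then sum those maxima."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


# Hypothetical toy embeddings: two query vectors, two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
page_b = [[1.0, 1.0]]

# Rank pages by descending MaxSim score.
ranked = sorted(
    {"page_a": page_a, "page_b": page_b}.items(),
    key=lambda item: maxsim_score(query, item[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

Because each query vector is matched independently, a page can score highly when different regions of it answer different parts of the query, which is the main motivation for keeping several vectors per document instead of one.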

© 2025 AIbase